Abstract

We design a 3/2 approximation algorithm for the Generalized Steiner Tree problem (GST) in metrics with distances 1 and 2. This is the first polynomial time approximation algorithm for a wide class of non-geometric metric GST instances with approximation factor below 2.



1 Introduction

We design a 3/2 approximation algorithm for constructing generalized Steiner trees (Steiner forests) for metrics with distances 1 and 2. With the exception of geometric metrics [5], no wide classes of instances were known with approximation ratios better than 2. This was in contrast to similar problems such as the Traveling Salesman and Steiner Tree Problems [1], [3].



2 Definitions and Notation 

A metric with distances 1 and 2 can be represented as a graph with edges being pairs 
of distance 1 and non-edges being pairs of distance 2. We will use GST[1,2] to denote 
the Generalized Steiner Tree Problem restricted to such metrics. 
An instance of GST[1,2] consists of a graph $G = (V, E)$ that defines a metric in this way, and a collection $\mathcal{R}$ of subsets of $V$ called required sets. We say that $\bigcup_{R \in \mathcal{R}} R$ is the set of terminals. In a proper instance, the required sets do not overlap and have more than one element. It is obvious that for every family of requirements $\mathcal{R}$ there exists a unique family $prop(\mathcal{R})$ that is equivalent and proper.
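To make the notion of an equivalent proper family concrete, here is a minimal Python sketch of how $prop(\mathcal{R})$ could be computed. It assumes that the equivalent proper family is obtained by merging required sets that share a node (transitively) and discarding sets with fewer than two nodes; the function name `prop` only mirrors the notation and is not taken from the paper.

```python
def prop(required_sets):
    """Equivalent proper family of required sets (sketch).

    Assumption: sets sharing a node are merged transitively (a valid
    solution must put them in one component anyway) and sets with fewer
    than two nodes are dropped (they constrain nothing).
    """
    merged = []  # pairwise disjoint frozensets built so far
    for r in required_sets:
        current = set(r)
        disjoint = []
        for m in merged:
            if m & current:      # overlap: absorb m into the current set
                current |= m
            else:
                disjoint.append(m)
        disjoint.append(frozenset(current))
        merged = disjoint
    return {m for m in merged if len(m) > 1}


print(prop([{1, 2}, {2, 3}, {4}, {5, 6}]))
```

Merging incrementally keeps `merged` pairwise disjoint, so a set that bridges two earlier groups absorbs both of them.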

A valid solution is a set $F$ of unordered node pairs such that each $R_i$ is contained in a connected component of $(V, F)$. The objective is to minimize $|F \cap E| + 2|F - E|$.
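A minimal Python sketch of the objective and of the validity condition, assuming nodes are hashable labels and node pairs are given as 2-tuples; the helper names `solution_cost` and `is_valid` are illustrative only.

```python
def solution_cost(edges, F):
    """|F ∩ E| + 2|F - E| for a set F of unordered node pairs."""
    E = {frozenset(e) for e in edges}
    pairs = {frozenset(f) for f in F}  # unordered, duplicates ignored
    return sum(1 if p in E else 2 for p in pairs)


def is_valid(nodes, F, required_sets):
    """True if every required set lies in one connected component of (V, F)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in F:
        parent[find(u)] = find(v)
    return all(len({find(v) for v in R}) == 1 for R in required_sets)


nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c")]
F = [("a", "b"), ("c", "d")]          # one edge (cost 1), one non-edge (cost 2)
print(solution_cost(edges, F))        # 3
print(is_valid(nodes, F, [{"a", "b"}, {"c", "d"}]))  # True
```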

We will use in the sequel some notation and terminology introduced in [3]. 

A basic building block of our solutions is an s-star consisting of a non-terminal $c$, called the center, $s$ terminals $t_1, \dots, t_s$ and edges $(c, t_1), \dots, (c, t_s)$. In [3] we also used a more general building block, an $(r, s)$-comet, consisting of a non-terminal center $c$, non-terminal fork nodes $f_1, \dots, f_s$ plus $r + 2s$ terminals; the center is connected to $r$ terminals and all the fork nodes, while each fork node is connected to two terminals of its own.

If $s < 3$ we say that the star is degenerate, and otherwise proper.

In the analysis of our algorithm, we will view its selections as transformations 
of an input instance, so after each phase we have a partial solution and a residual 
instance. We formalize these notions as follows. 

A partition $\Pi$ of $V$ induces a graph $(\Pi, E(\Pi))$ where $(A, B) \in E(\Pi)$ if $(u, v) \in E$ for some $u \in A$, $v \in B$. We say that $(u, v)$ is a representative of $(A, B)$.

Similarly, $\Pi$ induces required sets. Let $R_\Pi = \{A \in \Pi : A \cap R \neq \emptyset\}$; then $\mathcal{R}_\Pi = prop(\{R_\Pi : R \in \mathcal{R}\})$.

In our algorithms, we augment an initially empty solution $F$. The edge set $F$ defines a partition $\Pi(F)$ into connected components of $(V, F)$. In a step, we identify a connected set $A$ in the induced graph $(\Pi(F), E(\Pi(F)))$ and we augment $F$ with representatives of edges that form a spanning tree of $A$. We call this "collapsing $A$", because $A$ becomes a single node of $(\Pi(F), E(\Pi(F)))$.
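The collapse operation can be sketched in Python as follows, assuming the partition is a list of frozensets and edges are 2-tuples. `components`, `induced_graph` and `collapse` are hypothetical helper names; the spanning tree is found by a simple graph search, which is one of many valid choices.

```python
def components(nodes, F):
    """Partition Pi(F) of V into the connected components of (V, F)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in F:
        parent[find(u)] = find(v)
    groups = {}
    for v in nodes:
        groups.setdefault(find(v), set()).add(v)
    return [frozenset(g) for g in groups.values()]


def induced_graph(partition, edges):
    """Edges of (Pi, E(Pi)): each pair {A, B} mapped to one representative (u, v)."""
    part_of = {v: A for A in partition for v in A}
    reps = {}
    for u, v in edges:
        A, B = part_of[u], part_of[v]
        if A != B:
            reps.setdefault(frozenset((A, B)), (u, v))
    return reps


def collapse(partition, edges, F, group):
    """Add to F the representatives of a spanning tree of `group` in E(Pi)."""
    reps = induced_graph(partition, edges)
    group = list(group)
    seen, frontier, F = {group[0]}, [group[0]], set(F)
    while frontier:
        A = frontier.pop()
        for B in group:
            key = frozenset((A, B))
            if B not in seen and key in reps:
                seen.add(B)
                frontier.append(B)
                F.add(reps[key])
    if len(seen) != len(group):
        raise ValueError("group is not connected in the induced graph")
    return F


nodes = ["c", "t1", "t2", "t3"]
edges = [("c", "t1"), ("c", "t2"), ("c", "t3")]
parts = components(nodes, [])              # four singleton parts
F = collapse(parts, edges, set(), parts)   # collapse the whole 3-star
print(len(F), len(components(nodes, F)))   # 3 1
```

Collapsing the whole 3-star adds three representatives to $F$ and leaves a single part.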

Thus, if we select some "building block" C, F is going to be augmented by the 
representatives of the edges in C, and this changes the "residual" graph in which we 
make our next selection. For that reason we will use the terms "select" and "collapse" as synonyms.

3 Analyzing Greedy Heuristics 

We introduce a new way of analyzing greedy heuristics for our problem, and in 
this section we illustrate it on the example of the Rayward-Smith heuristic [6] for 
STP[1,2]. This heuristic has an approximation ratio of exactly 4/3, as demonstrated by Bern and Plassmann [1]. However, the new analysis method is tighter (see Theorem 1) and characterizes the effect of more general classes of greedy choices, as we will show in the next section. We have reformulated the Rayward-Smith heuristic as follows.
While there is more than one terminal, perform the first possible operation from the following list:

1. Preprocessing: Collapse an edge between terminals. 

2. Collapsing of stars: Collapse an s-star S with maximum s. 

3. Finishing: Connect two terminals with a non-edge. 
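The following Python sketch runs the reformulated loop on an explicit graph and returns only the cost of the constructed tree. Assumptions not stated in the text: stars are only collapsed when $s \ge 3$ (degenerate stars save nothing over a Finishing step), ties between centers are broken by node order, and the residual graph is simulated with union-find rather than by rebuilding $(\Pi(F), E(\Pi(F)))$.

```python
def rayward_smith(nodes, edges, terminals, min_star=3):
    """Cost of the Steiner tree built by the three-operation loop (sketch)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    def union(a, b):
        parent[find(a)] = find(b)

    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    cost = 0
    while True:
        troots = {find(t) for t in terminals}   # current terminal components
        if len(troots) <= 1:
            return cost
        # 1. Preprocessing: an edge between two different terminal components.
        pre = next(((u, v) for u, v in edges
                    if find(u) in troots and find(v) in troots
                    and find(u) != find(v)), None)
        if pre is not None:
            union(*pre)
            cost += 1
            continue
        # 2. Collapsing of stars: the non-terminal center hitting most components.
        best, best_hits = None, {}
        for c in nodes:
            if find(c) in troots:
                continue
            hits = {}
            for t in adj[c]:
                if find(t) in troots:
                    hits.setdefault(find(t), t)  # one representative per component
            if len(hits) > len(best_hits):
                best, best_hits = c, hits
        if len(best_hits) >= min_star:
            for t in best_hits.values():
                union(t, best)
            cost += len(best_hits)
            continue
        # 3. Finishing: join two terminal components with a non-edge.
        a, b = sorted(troots, key=str)[:2]
        union(a, b)
        cost += 2


star = [("c", "t1"), ("c", "t2"), ("c", "t3")]
print(rayward_smith(["c", "t1", "t2", "t3"], star, ["t1", "t2", "t3"]))  # 3
```

Because collapsed centers join a terminal component, later stars can attach to them, which matches the idea that a collapsed set becomes a single node of the residual graph.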

If we can perform a step of Preprocessing, the approximation ratio can only improve, since the collapsed edge can be forced into the optimal solution. Thus it suffices to analyze the case when no two terminals share a cost-1 connection.

Let $T^*$ be an optimal Steiner tree and let $T = T^* \cap E$ be its Steiner skeleton, consisting of its edges (cost-1 connections).

Let $T_{RS}$ be the Steiner tree given by the Rayward-Smith heuristic. We are going to prove the following.

Theorem 1 $cost(T_{RS}) \le cost(T^*) + \frac{1}{3} cost(T)$.

In the analysis of the Collapsing of stars and Finishing, we update the following 
three values after each iteration: 

CA = the total cost of $F$, the set of edges collapsed so far; initially, CA = 0;

CR = the cost of the reference solution $T_{ref}$ derived from the optimum solution $T^*$; $T_{ref}$ is a solution of the residual problem in $(V_F, E_F)$;

P = the sum of potentials distributed among objects, which will be defined later. 

The sum CA + CR + P will be the promised cost, PromCost.
We will define the potential so that it satisfies the following conditions:

(a) initially, $P \le cost(T)/3 \le cost(T^*)/3$;

(b) after each star collapse, PromCost is unchanged or decreases;

(c) at termination, CR = 0 ($T_{ref}$ will be empty) and P = 0.

These properties clearly imply our claim: the initial PromCost satisfies the statement of the theorem, PromCost cannot increase, and at termination we return a solution with that cost.

In the analysis, we also use the skeleton of $T_{ref}$, $\bar{T}_{ref} = T_{ref} \cap E$, the set of 1-cost connections of $T_{ref}$; initially, $T_{ref} = T^*$ and $\bar{T}_{ref} = T = T^* \cap E$. The potential is given to the following objects:

• edges of $\bar{T}_{ref}$;

• C-comps, which are connected components of $\bar{T}_{ref}$;

• S-comps, which are Steiner full components of $\bar{T}_{ref}$.
The total potential of edges, C-comps and S-comps is denoted PE, PC and PS, respectively. At all times, the potential of each edge $e \in \bar{T}_{ref}$ is $p(e) = \frac{1}{3}$. Initially, the potential of each C-comp and S-comp is zero.

A Steiner tree is called bridgeless if no two Steiner points are adjacent and each 
Steiner point has degree at least 3. 

Lemma 1 Without increasing PromCost, we can transform the optimum solution $T^*$ into a bridgeless reference solution $T_{ref}$, while the new potential $p$ satisfies

(i) each C-comp $C$ has $p(C) \ge -\frac{2}{3}$, and if $C$ has fewer than 3 edges, $p(C) = 0$;

(ii) each S-comp $S$ has $p(S) = 0$.

Proof. Because CA and PS remain zero, to see that PromCost does not increase it suffices that each transformation of $T_{ref}$ and $p$ satisfies $\Delta CR + \Delta PE + \Delta PC \le 0$. The bridgeless Steiner tree is obtained using the following two types of steps.
Path step. Suppose that $\bar{T}_{ref}$ contains a Steiner point $v$ of degree 2. We remove the two edges incident to $v$ from $\bar{T}_{ref}$, adding a non-edge (cost-2 connection) to $T_{ref}$. The potential of both resulting C-comps is set to 0. One can see that $\Delta CR = 0$, $\Delta PE = -\frac{2}{3}$ (two edges removed) and $\Delta PC \le \frac{2}{3}$ ($\Delta PC = \frac{2}{3}$ if the component $C$ that was split had $p(C) = -\frac{2}{3}$).

If the removal of edges in a Path step creates Steiner points of degree 1, we remove 
them; this can only decrease PromCost. 

Bridge Step. Suppose that we cannot perform a Path step and $e \in \bar{T}_{ref}$ is a bridge, i.e., an edge $e = (u, v)$ between Steiner points. We remove this edge from $T_{ref}$ (replacing it with a non-edge between terminals); this splits a C-comp $C$ into $C_0$ and $C_1$. Each new C-comp has at least two edges since $u$ and $v$ originally have degree at least 3. We set $p(C_0) = p(C)$ and $p(C_1) = -\frac{2}{3}$. Thus $\Delta CR = 1$ (the cost is increased by 1), $\Delta PE = -\frac{1}{3}$ and $\Delta PC = -\frac{2}{3}$ (one more C-comp with potential $-\frac{2}{3}$).

Note that if we create a C-comp with two edges, we can apply a Bridge Step; this is because we assume that there are no edges between terminals. □
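The accounting of the two transformation steps can be checked with exact fractions. This sketch uses the values from the proof, $p(e) = \frac{1}{3}$ for skeleton edges and C-comp potentials equal to $0$ or $-\frac{2}{3}$; the function names are hypothetical.

```python
from fractions import Fraction

P_EDGE = Fraction(1, 3)         # p(e) for every edge of the skeleton
P_CCOMP_MIN = Fraction(-2, 3)   # lowest C-comp potential allowed by (i)


def path_step(p_split):
    """(dCR, dPE, dPC): two skeleton edges become one non-edge; both halves get 0."""
    d_cr = -2 + 2                      # two cost-1 edges out, one cost-2 link in
    d_pe = -2 * P_EDGE
    d_pc = (0 + 0) - p_split
    return d_cr, d_pe, d_pc


def bridge_step(p_split):
    """(dCR, dPE, dPC): a Steiner-Steiner edge becomes a non-edge between terminals."""
    d_cr = -1 + 2
    d_pe = -P_EDGE
    d_pc = (p_split + P_CCOMP_MIN) - p_split   # C0 keeps p(C), C1 gets -2/3
    return d_cr, d_pe, d_pc


for p in (Fraction(0), P_CCOMP_MIN):
    print(p, sum(path_step(p)), sum(bridge_step(p)))
```

Both steps keep $\Delta CR + \Delta PE + \Delta PC \le 0$; the Bridge Step is exactly balanced, which is why its new C-comp receives the lowest admissible potential.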

From now on our reference Steiner tree $T_{ref}$ is assumed to be bridgeless.
Now we will prove

Lemma 2 After collapsing an s-star S, $s \ge 4$, conditions (i)-(ii) of Lemma 1 are satisfied and PromCost does not increase.

Proof. Suppose that the terminals of S lie in $a$ C-comps. To break the cycles created in $T_{ref}$ when we collapse S, we remove $s - 1$ connections, of which $a - 1$ are cost-2 connections between different C-comps and $s - a$ are edges within C-comps.

If this is the entire modification, $\Delta CA = s$, $\Delta CR = -s - a + 2$, $\Delta PE = -\frac{1}{3}(s - a)$ (for edges removed from $\bar{T}_{ref}$), while $\Delta PC \le \frac{2}{3}(a - 1)$ (for removing the potential of $a - 1$ C-comps, each $-\frac{2}{3}$ or 0); hence

$\Delta PromCost \le s - s - a + 2 - \frac{1}{3}(s - a) + \frac{2}{3}(a - 1) = \frac{1}{3}(4 - s) \le 0.$

However, the new C-comp that we create can be trivial; in this case we need to increase the estimate of $\Delta PC$ by $\frac{2}{3}$. If that C-comp had but one edge left, this edge would be removed from $T_{ref}$ and $\bar{T}_{ref}$, which decreases the estimate of $\Delta CR$ by 1 and of $\Delta PE$ by $\frac{1}{3}$. If that C-comp had two edges left, we would remove them from $\bar{T}_{ref}$ using a Path step; this does not change CR but decreases PE by $\frac{2}{3}$. Therefore our estimate of PromCost does not increase. □
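A quick exact-arithmetic check of the bound in the proof of Lemma 2: for every $s \ge 4$ and every $1 \le a \le s$ the main-case estimate equals $(4 - s)/3$, and the correction for a trivial merged C-comp is never positive. This is a sketch with hypothetical function names, using the fractions stated in the proof.

```python
from fractions import Fraction


def delta_promcost(s, a):
    """Upper bound on the change of PromCost when an s-star whose terminals
    lie in `a` C-comps is collapsed (main case of the proof)."""
    d_ca = s                               # s collapsed edges
    d_cr = -2 * (a - 1) - (s - a)          # a-1 non-edges and s-a edges removed
    d_pe = -Fraction(1, 3) * (s - a)       # removed skeleton edges
    d_pc = Fraction(2, 3) * (a - 1)        # a-1 C-comps lose potential -2/3 (or 0)
    return d_ca + d_cr + d_pe + d_pc


def trivial_correction(edges_left):
    """Net extra change when the merged C-comp ends up trivial."""
    extra = Fraction(2, 3)                 # its potential must be reset to 0
    if edges_left == 1:
        return extra - 1 - Fraction(1, 3)  # the edge leaves T_ref: dCR -1, dPE -1/3
    if edges_left == 2:
        return extra - Fraction(2, 3)      # Path step: dPE -2/3, dCR unchanged
    raise ValueError("the proof only discusses one or two remaining edges")


for s in range(4, 9):
    assert all(delta_promcost(s, a) == Fraction(4 - s, 3) for a in range(1, s + 1))
print(trivial_correction(1), trivial_correction(2))   # -2/3 0
```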

Once we have collapsed all s-stars with $s \ge 4$, we redistribute potential between C-comps and S-comps by increasing the potential of each nontrivial C-comp by $\frac{1}{6}$, bringing it to $-\frac{1}{2}$, and decreasing the potential of one of its S-comps to $-\frac{1}{6}$. This replaces conditions (i)-(ii) with

(i') each C-comp $C$ has $p(C) \ge -\frac{1}{2}$ and each trivial C-comp (with at most one edge) has $p(C) = 0$;

(ii') each S-comp $S$ has $p(S) \ge -\frac{1}{6}$.

Lemma 3 After collapsing a 3-star, conditions (i')-(ii') are satisfied and PromCost 
does not increase. 

Proof. Suppose that the terminals of the selected star S belong to 3 different C-comps. Then we replace two cost-2 connections from $T_{ref}$ with 3 collapsed edges, while the number of C-comps decreases by 2; thus

$\Delta PromCost = \Delta CA + \Delta CR + \Delta PC \le 3 - 4 + 2 \cdot \frac{1}{2} = 0.$

Suppose that the terminals of S belong to 2 different C-comps. Then $\Delta CR = -3$, because we remove one cost-2 connection from $T_{ref}$ and one edge from an S-comp. This S-comp becomes a 2-star, hence we remove it from $\bar{T}_{ref}$ using a Path step; altogether we remove 3 edges from $\bar{T}_{ref}$ and $\Delta PE = -1$.

One S-comp disappears, so $\Delta PS \le \frac{1}{6}$. Because we collapse two C-comps into one, $\Delta PC \le \frac{1}{2}$. Consequently,

$\Delta PromCost \le 3 - 3 - 3 \cdot \frac{1}{3} + \frac{1}{6} + \frac{1}{2} < 0.$

If the terminals of the selected star belong to a single C-comp and we remove 2 edges from a single S-comp, we also remove the third edge of this S-comp and $\Delta CR = -3$, while $\Delta PE = -1$, $\Delta PS \le \frac{1}{6}$, and if its C-comp degenerates to a single node, $\Delta PC = \frac{1}{2}$ (otherwise zero). This yields the same change in PromCost as the previous case.

Finally, if the terminals of the selected star belong to a single C-comp and we remove 2 edges from two S-comps, we have $\Delta CA + \Delta CR = 1$. Because we apply Path steps to those two S-comps, $\Delta PE = -2$, while $\Delta PS \le \frac{1}{3}$ and $\Delta PC \le \frac{1}{2}$. Thus $\Delta PromCost$ is at most $-\frac{1}{6}$. □
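The four cases of the proof of Lemma 3 can be tabulated the same way. The tuple entries below are the bounds from the proof, with potentials $p(e) = \frac{1}{3}$, $p(C) \ge -\frac{1}{2}$ and $p(S) \ge -\frac{1}{6}$; the case labels are ours, and this is only a bookkeeping sketch.

```python
from fractions import Fraction as Fr

# Terms (dCA, dCR, dPE, dPS, dPC) for each case of the proof of Lemma 3;
# the potential terms are the upper bounds used in the text.
CASES = {
    "three C-comps":           (3, -4, 0, 0, 2 * Fr(1, 2)),
    "two C-comps":             (3, -3, -3 * Fr(1, 3), Fr(1, 6), Fr(1, 2)),
    "one C-comp, one S-comp":  (3, -3, -3 * Fr(1, 3), Fr(1, 6), Fr(1, 2)),
    "one C-comp, two S-comps": (3, -2, -6 * Fr(1, 3), 2 * Fr(1, 6), Fr(1, 2)),
}

for name, terms in CASES.items():
    print(f"{name:26s} dPromCost <= {sum(terms)}")
```

Only the first case is tight, which is where the choice of $-\frac{1}{2}$ for C-comps comes from.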

To complete the proof of Theorem 1, it suffices to see that when no more star collapsing is possible, $T_{ref}$ consists of cost-2 connections, $\bar{T}_{ref} = \emptyset$, and thus the remaining potential is zero. Each Finishing step increases CA by 2 and decreases CR by 2, with no change in PromCost. When we terminate, we have a solution with cost CA = PromCost.
4 3/2 Approximation for GST with Distances 1 and 2

In the heuristic for STP[1,2] we could start with Preprocessing, in which we collapsed every edge (cost-1 connection) between terminals, arguing that such an edge can be forced into an optimum solution $T^*$. In GST[1,2] this is no longer valid, because this could be an edge between different connected components of $T^*$. Indeed, we need to increase our potential, and thus PromCost, to create a "budget" for this class of wrong selections: connecting sets that should not be connected.

Instead, we can start with the following preprocessing that is safe in the context 
of GST[1,2]: 



G-Preprocessing: While there exists an edge or an s-star (with $s \ge 3$) contained in one of the required sets $R_i$, collapse it.
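A Python sketch of G-Preprocessing, assuming required sets are given as Python sets of node labels and the residual graph is tracked with union-find. When a center qualifies, the sketch collapses all of its neighbors in $R_i$ at once; the order in which edges and stars are tried is our choice, not the paper's.

```python
def g_preprocess(nodes, edges, required_sets, min_star=3):
    """Collapse edges and s-stars (s >= min_star) lying inside one required set.

    Returns the list of collapsed node pairs and their total cost.
    """
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    terminals = set().union(*required_sets)
    F, cost = [], 0

    def step():
        nonlocal cost
        term_roots = {find(t) for t in terminals}
        for R in required_sets:
            roots = {find(v) for v in R}          # parts of R in the residual graph
            if len(roots) < 2:
                continue
            for u, v in edges:                    # an edge inside R
                if find(u) in roots and find(v) in roots and find(u) != find(v):
                    parent[find(u)] = find(v)
                    F.append((u, v))
                    cost += 1
                    return True
            for c in nodes:                       # an s-star inside R
                if find(c) in term_roots:
                    continue
                hits = {}
                for t in adj[c]:
                    if find(t) in roots:
                        hits.setdefault(find(t), t)
                if len(hits) >= min_star:
                    for t in hits.values():
                        parent[find(t)] = find(c)
                        F.append((c, t))
                    cost += len(hits)
                    return True
        return False

    while step():
        pass
    return F, cost


print(g_preprocess(["c", "t1", "t2", "t3"],
                   [("c", "t1"), ("c", "t2"), ("c", "t3")],
                   [{"t1", "t2", "t3"}])[1])   # 3
```

A star whose terminals come from two different required sets is left alone here, which is exactly the kind of connection whose safety the rest of this section has to argue.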



We can also normalize the optimum solution $T^*$ to ensure two properties: Steiner nodes have degree at least 3, and cost-2 connections connect only pairs of terminals from the same required set $R_i$. Steiner nodes of degree 1 can obviously be removed; Steiner nodes of degree 2 and cost-2 connections can be removed, and reconnection, if needed, can be achieved by connecting terminals that have to be connected.

Because the terminals of the edge (s-star) selected by G-Preprocessing are surely contained in a single connected component of the optimum forest $T^*$, while we increase CA by 1 ($s$), we decrease the cost of $T^*$ of the residual problem by 1 ($s - 1$), thus preserving the approximation ratio of $1/1$ ($s/(s - 1)$, which is at most $3/2$ for $s \ge 3$).

Thus we can proceed with the assumption that no steps of G-Preprocessing can be performed. After the preprocessing, we can normalize the "reference tree" $T_{ref}$, which we initialize with $T^*$. Because $T_{ref}$ has multiple connected components and may also contain edges between terminals, we introduce two new notions:

• F-comps, which are connected components of the forest $T_{ref}$;

• T-comps, which are connected components of the subgraph of $T_{ref}$ induced by the terminals.

We also introduce a second component of the potential, $p_g$, such that the sum of all $p(\text{object})$ and $p_g(\text{object})$ does not exceed $\frac{1}{2} cost(T^*)$. We will use $p_g$ to cover the cost of connections made between different F-comps, the class of errors that is specific to the generalized problem.

We can give $p_g(e) = \frac{1}{6}$ to every edge of $T^* \cap E$; for edges inside a T-comp we can increase it to $p_g(e) = \frac{1}{2}$. For each non-edge $e'$ in $T^*$ we can give $p_g(e') = 1$. Moreover, to each initial C-comp $C$ we can give $p_g(C) = \frac{2}{3}$. Let $p_g(F)$ be the sum of the $p_g$ potentials of the objects contained in an F-comp $F$.

We define PromCost' = PromCost + PF, where PF is the sum of all $p_g(F)$. Our goal is to build a solution by collapsing selected connections without increasing PromCost'. When we make a connection within an F-comp, we do not increase PromCost, and PF does not increase either.

When we make a selection that connects two F-comps, say $F_1, F_2$, into $F = F_1 \cup F_2$, we can cover the cost of that connection using $p_g(F_1)$, and $F$ can use $p_g(F_2)$ for a future connection with another F-comp. Because we will not connect distinct F-comps with non-edges, such a connection costs at most $\frac{3}{2}$ (this is the cost per connection made by a 3-star; larger stars and edges make connections at a smaller cost). This is safe if $p_g(F_1) \ge \frac{3}{2}$.

Lemma 4 If a required set of terminals $R_i$ has more than 2 nodes, it is contained in an initial F-comp $F$ such that $p_g(F) \ge \frac{3}{2}$.

Proof. Tree $F$ may contain three kinds of connections: $e$ is a T-connection if it is an edge between terminals, and then $p_g(e) = \frac{1}{2}$; a 2-connection if it is a non-edge, and then $p_g(e) = 1$; and a C-connection if it is any other edge, and then $p_g(e) = \frac{1}{6}$.

If $F$ contains a C-connection, it contains a Steiner node, and thus at least 3 C-connections and a C-comp; those objects alone give $3 \cdot \frac{1}{6} + \frac{2}{3} = \frac{3}{2} - \frac{1}{3}$. If there are no other connections in $F$, it is a 3-star, but in this case all terminals of that star are in $R_i$ and we would collapse it in G-Preprocessing. Any other connection would increase $p_g(F)$ to at least $\frac{3}{2}$.

Now we assume that $F$ does not contain C-connections. If it contains a 2-connection, it must contain another connection as well, and the least possible $p_g(F)$ is $1 + \frac{1}{2}$, attained if this other connection is a T-connection. In the remaining case, $F$ has some $a$ terminals, $a - 1$ T-connections and $p_g(F) = (a - 1)\frac{1}{2}$. Again, if $a > 3$ then $p_g(F)$ is sufficiently high, and if $a = 3$ then the only terminals of $F$ are those of $R_i$ and the T-connections would be collapsed in G-Preprocessing. □
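The case analysis in the proof of Lemma 4 is simple bookkeeping; the sketch below recomputes it with exact fractions. The values assigned to T-connections, 2-connections, C-connections and initial C-comps ($\frac{1}{2}$, $1$, $\frac{1}{6}$, $\frac{2}{3}$) are our reading of the potentials defined in this section and should be treated as assumptions.

```python
from fractions import Fraction as Fr

# Assumed p_g values: T-connection, 2-connection, C-connection, initial C-comp.
PG = {"T": Fr(1, 2), "two": Fr(1), "C": Fr(1, 6), "Ccomp": Fr(2, 3)}
SAFE = Fr(3, 2)   # largest cost of one connection between F-comps (a 3-star)


def pg_of(**counts):
    """p_g(F) for an F-comp with the given numbers of objects of each kind."""
    return sum((PG[kind] * n for kind, n in counts.items()), Fr(0))


cases = {
    "3-star only (collapsed by G-Preprocessing)": pg_of(C=3, Ccomp=1),
    "3-star plus a T-connection": pg_of(C=3, Ccomp=1, T=1),
    "2-connection plus a T-connection": pg_of(two=1, T=1),
    "a = 4 terminals, 3 T-connections": pg_of(T=3),
    "a = 3 terminals (collapsed by G-Preprocessing)": pg_of(T=2),
}
for name, value in cases.items():
    print(f"{name}: {value} (safe: {value >= SAFE})")
```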

We can also observe that 

Lemma 5 If a required set of terminals $R_i$ has 2 nodes, it is contained in an initial F-comp $F$ such that $p_g(F) \ge 1$.

For this reason, it is always safe to collapse edges between terminals. However, the status of the resulting merged sets of terminals requires some reasoning. Let us say that a set of terminals $F$ is safe if $p_g(F) \ge \frac{3}{2}$. When we merge two sets of terminals $F_1$ and $F_2$ using a connection of cost $c$, the union $F = F_1 \cup F_2$ gets $p_g(F) = p_g(F_1) + p_g(F_2) - c$. If $c = 1$, the union is safe as long as at least one of $F_1, F_2$ is safe, but not otherwise. However, suppose that after a union creating a larger unsafe set $F$, an edge (of the residual graph) is contained in $F$. Then the balance of $F$ is more favorable, by 1, than our pessimistic reasoning that deemed $F$ unsafe, and this suffices to tag it safe.
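The merging rule amounts to a few lines of arithmetic. In this sketch `merged_pg` and `is_safe` are hypothetical names, the threshold is $\frac{3}{2}$, and unsafe sets are assumed to carry $p_g \ge 1$, as guaranteed for pairs by Lemma 5.

```python
from fractions import Fraction as Fr

SAFE = Fr(3, 2)   # enough to pay for one connection made by a 3-star


def merged_pg(pg1, pg2, c, edge_inside=False):
    """Pessimistic p_g of F1 ∪ F2 joined by a connection of cost c.

    If the union later turns out to contain an edge of the residual graph,
    its balance is better by 1 than this estimate.
    """
    return pg1 + pg2 - c + (1 if edge_inside else 0)


def is_safe(pg):
    return pg >= SAFE


print(is_safe(merged_pg(SAFE, Fr(1), 1)))         # True: one side was safe
print(is_safe(merged_pg(Fr(1), Fr(1), 1)))        # False: two unsafe pairs
print(is_safe(merged_pg(Fr(1), Fr(1), 1, True)))  # True: an inner edge adds 1
```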

Thus we can perform a somewhat bolder preprocessing if we keep track of which resulting requirement sets are safe and which are not.
GE-Preprocessing (G-Preprocessing, extended version): Tag each required set of terminals $R_i$ as safe if $|R_i| > 2$ and unsafe otherwise. While you can, do the following: collapse an s-star contained in a required set, and collapse any edge between two terminals. In the latter case, if these two terminals were in two different required sets, so that the collapse replaces them with their union, tag the union safe if at least one of the merged requirements was safe. Moreover, if the collapsed edge is contained in some requirement set, tag that set safe.
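Below is one possible Python rendering of GE-Preprocessing, assuming the required sets are disjoint Python sets and that edges between terminal components are tried before stars (the description does not fix an order). Node components are tracked with one union-find structure, and merged requirement sets, together with their safe tags, with a second one; all names are hypothetical.

```python
def ge_preprocess(nodes, edges, required_sets, min_star=3):
    """GE-Preprocessing sketch: returns (cost, [(member indices, safe flag)])."""
    parent = {v: v for v in nodes}              # union-find over nodes

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    group = list(range(len(required_sets)))     # union-find over required sets

    def gfind(i):
        while group[i] != i:
            group[i] = group[group[i]]
            i = group[i]
        return i

    safe = [len(R) > 2 for R in required_sets]
    owner = {t: i for i, R in enumerate(required_sets) for t in R}
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def set_of(v, rep):
        """Index of the (merged) requirement set owning v's terminal component."""
        return gfind(owner[rep[find(v)]])

    cost = 0
    while True:
        rep = {}                                  # terminal component -> a terminal in it
        for t in owner:
            rep.setdefault(find(t), t)
        # Collapse an edge between two terminal components.
        e = next(((u, v) for u, v in edges
                  if find(u) in rep and find(v) in rep and find(u) != find(v)), None)
        if e is not None:
            gu, gv = set_of(e[0], rep), set_of(e[1], rep)
            parent[find(e[0])] = find(e[1])
            cost += 1
            if gu == gv:
                safe[gu] = True                   # edge contained in one requirement set
            else:
                group[gu] = gv                    # union of two requirement sets
                safe[gv] = safe[gu] or safe[gv]
            continue
        # Collapse an s-star whose terminals lie in one requirement set.
        star = None
        for c in nodes:
            if find(c) in rep:
                continue
            by_set = {}
            for t in adj[c]:
                if find(t) in rep:
                    by_set.setdefault(set_of(t, rep), {}).setdefault(find(t), t)
            star = next(((c, h) for h in by_set.values() if len(h) >= min_star), None)
            if star:
                break
        if star is None:
            break
        c, hits = star
        for t in hits.values():
            parent[find(t)] = find(c)
        cost += len(hits)
    groups = {}
    for i in range(len(required_sets)):
        groups.setdefault(gfind(i), []).append(i)
    return cost, [(members, safe[g]) for g, members in groups.items()]


print(ge_preprocess(["a", "b", "x", "y"], [("b", "x")], [{"a", "b"}, {"x", "y"}]))
```

In the usage line, two pair requirements joined by a single edge form an unsafe union; adding a second edge inside that union would tag it safe.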



We are now left with the problem of what to do with the unsafe sets that remain after GE-Preprocessing. To address it, we need a stronger version of Lemma 1.

Lemma 6 Without increasing PromCost, we can transform the optimum solution $T^*$ into a reference solution that satisfies the conditions of Lemma 1 and in which each T-comp has a cost-1 connection to at most one Steiner point.

Proof. The reasoning is the same as in Lemma 1, except that we need to perform a Bridge Step when a T-comp is connected to more than one Steiner node. If we cannot remove such a connection with a Path step, we break a C-comp into two, so that each of the resulting parts has at least two edges adjacent to Steiner nodes (the premise sufficient for the reasoning of the Bridge Step). □

Now we can justify 



Annihilation of unsafe sets: After GE-Preprocessing, break each unsafe set of requirements into the original requirements, connect them individually with cost-2 connections and remove them from further consideration.
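A Python sketch of the annihilation step and of the change in CA used in the proof of Lemma 7. It assumes groups are given as (list of original requirement indices, safe flag) pairs and that every unsafe group consists of two-terminal requirements only; all names are hypothetical.

```python
def annihilate(groups, required_sets):
    """Split each unsafe group into its original pair requirements.

    Returns the safe groups that stay in the instance and one non-edge per
    original pair of an unsafe group.
    """
    kept, non_edges = [], []
    for members, is_safe in groups:
        if is_safe:
            kept.append(members)
            continue
        for i in members:
            u, v = tuple(required_sets[i])   # unsafe groups consist of pairs only
            non_edges.append((u, v))
    return kept, non_edges


def delta_ca(p):
    """Change of CA for an unsafe union of p pairs: p non-edges in, p-1 edges out."""
    return 2 * p - (p - 1)


kept, non_edges = annihilate([([0, 1], False), ([2], True)],
                             [{"a", "b"}, {"x", "y"}, {"p", "q", "r"}])
print(kept, len(non_edges), delta_ca(2))   # [[2]] 2 3
```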



Lemma 7 Annihilation of unsafe sets does not increase PromCost'.

Proof. Consider an unsafe requirement $R'$. It is created from a union of some $p$ pair requirements $R_1, \dots, R_p$ (each with two terminals). Because $R'$ remains unsafe, GE-Preprocessing collapsed exactly $p - 1$ cost-1 connections inside $R'$, each merging two different required sets, so $R'$ consists of $p + 1$ connected components (T-comps). We increase CA by $p + 1$ by replacing these $p - 1$ connections with $p$ non-edges.

A pair $R_i$ that is connected separately in $T_{ref}$ contributes 3 to PromCost' and after annihilation uses only the correct cost, 2, so this case has a surplus. One can see that the tightest case is when in $T_{ref}$ every T-comp of $R'$ is connected by an edge to some Steiner node (a connection to a terminal would already have been collapsed). Thus we remove those connections and decrease CR by $p + 1$, remove the connections made by GE-Preprocessing and decrease CA by $p - 1$, and reconnect with $p$ non-edges; this does not change CA + CR, while the sum of potentials can only decrease. □
We conclude that after the Annihilation of unsafe sets we can proceed with the heuristic described in the previous section without increasing PromCost', and PromCost' is initialized as not larger than $\frac{3}{2} cost(T^*)$.

Our approximation algorithm thus consists of GE-Preprocessing, followed by Annihilation of unsafe sets, followed by the Rayward-Smith heuristic.

With the above we have the following main result. 

Theorem 2 There exists a polynomial time 3/2-approximation algorithm for GST[1,2].